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ABSTRACT 



This thesis describes the design and implement of a prototype compiler for the 
functional programming language PHI. The design is highly modularized and the authors 
think this should facilitate the understanding of both concept and implementation. The 
front-end of the compiler implements machine independent lexical and syntactic analyzers; 
top-down parsing techniques are employed. The back-end implements a machine 
dependent one-pass semantic analyzer and code generator. 

Since this implementation is a prototype, it does not possess all of the qualities 
desirable in a full implementation. The basic constructs of PHI: functions and data 
definitions are implemented, as well as the integer, natural number, and boolean types. 
However, the necessary hooks are present and the design is mature enough to allow 
expanding the prototype to a full implementation. 
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I. INTRODUCTION 



A. BACKGROUND — GENERAL 

In its attempt to provide students with a well rounded background to the field of 
computer science, the computer science department at the Naval Postgraduate School offers 
courses covering recent developments in programming languages. One of the courses 
deals specifically with the methodology of functional, also known as applicative, 
programming. Both the theory and the practice of functional programming are covered, 
concentrating more on the practice than the theory. In order to fully appreciate the nuances 
of functional programming it would be desirable to provide the students with a functional 
programming environment. This would provide a first hand look at the fundamental dif- 
ference in methodologies when programming in functional languages as opposed to 
programming in traditional imperative languages. 

Of the languages currently supported in the department; LISP, on the UNIX 1 
environment, comes the closest to meeting this requirement. Although LISP is considered 
a functional language by some, its many extensions and modifications actually brings it into 
the world of imperative programming. It is not a pure functional programming language. 

There are several additional problems associated with using LISP to teach techniques 
of functional programming. Modem LISP dialects do not support all aspects of functional 
programming. Most notably they lack the ability to define higher-order functions. 
Dynamic scoping and the semantics of the language make it a pedagogical nightmare to 
teach. [Ref. l:p. 0-1] The goal of teaching functional programming would rapidly be 
overtaken by the necessity of explaining the idiosyncrasies of LISP. In an 11 week 



is a trademark of Bell Laboratories. 
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quarter, time devoted to LISP would significantly detract from instruction of functional 
programming. 

Recognizing the shortcomings of LISP, a pure functional language, PHI was 
developed by Dr. B. J. MacLennan for use in this course of instruction. The syntax of 
PHI closely follows that of standard mathematical notation. This means students should 
have little difficulty in learning how to write legitimate PHI statements. Instruction can 
now concentrate on joining these statements to create functional programs. Hopefully, this 
will lead to a greater understanding and appreciation of the methodology of functional 
programming. 

B. BACKGROUND — THESIS 

Creation of PHI solved the problem of finding a suitable language to use to 
demonstrate the methodology of functional programming. However, currently PHI 
programs are programs on paper only. There exists no programming environment for the 
PHI language. So it is impossible to machine execute PHI programs. This thesis attempts 
to remedy the above problem by providing the first component in a PHI programming 
environment — a prototype PHI compiler. 

Conventional compiler construction techniques were chosen for this implementation 
for several reasons. By choosing conventional techniques, the authors were able to 
address the problems associated with utilizing conventional methods for implementing a 
compiler for a functional language 2 . Additionally, realizing that both the language and 
system would change, the authors wanted a well documented and understood 
methodology. The cost of maintaining a system can be as much as three times the 
development cost [Ref. 2:p. 478]. Therefore, it was imperative to choose a methodology 
that supported a clean and structured design. 

2 Specific problems and solutions are covered later in Chapters Two and Three 
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Following conventional methodologies, the authors separated the PHI compiler design 
into a front-end 3 and a back-end 4 . The overall general design of the PHI compiler is 
shown in Figure 1.1. The front-end, containing the scanner (lexical analyzer) and parser 
(syntactic analyzer) is essentially responsible for analysis of the external file containing the 
source program. The PHI compiler back-end couples semantic analysis with code 
generation to produce code suitable for execution on the target machine. [Ref. 3:pp. 5-6] 
The authors felt that a clear and distinct separation between parts would aid understanding 
of the system, simplify division of labor, and increase ease of development and 
maintenance. It should also result in greater flexibility for follow-on development in the 
PHI programming environment. As an example, the current front-end could be modified 
to support a PHI interpreter. 




Figure 1.1 General Design of the PHI Compiler 



C. BACKGROUND — FUNCTIONAL PROGRAMMING 

Functional programming is a methodology in favor among academicians. Although 
applicative programming goes further back, it is generally agreed that, as a methodology, 
functional programming traces its roots to John Backus [Ref. 4:p. 404, Ref. 5:p. 65]. In 

3 Design and implementation of the front-end is discussed in Chapter Two. 

4 Design and implementation of the back-end is discussed in Chapter Three. 
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his acceptance speech for the 1977 ACM Turing Award, Backus criticized traditional 
programming languages and programming styles. He went on to propose a new 
methodology of programming that involved "the use of a fixed set of combining forms 
called functional forms." [Ref. 6:p. 619] This methodology is known today as functional 
programming. 

1. Problems with Conventional Languages 

Backus feels [Ref. 6:pp. 613-619] that the basic underlying problem with 
conventional languages is the existence of the assignment statement. The assignment 
statement plays a central role in conventional languages and breaks programming into two 
worlds. Backus calls the right-hand side of assignment statements, expressions, the first 
of these worlds. The second world is the world of statements, with the primary statement, 
of course, being the assignment statement. 

Several problems are associated with assignment statements. First, they permit 
programs to be held hostage through access to their variables. Since variables are used to 
imitate the machine’s storage cells; assignment statements allow, even encourage, state 
changes to take place. This access, either direct or indirect, permits such problems as side 
effects, unintentional state changes, and aliasing to arise. It then becomes difficult to 
reason about the correctness of these programs, so proving simple programs correct is an 
arduous task and proving complex programs correct is virtually impossible. Additionally, 
by permitting the value of variables to be changed, the assignment statement makes 
temporal order of execution of statements critical. For example, the following two pieces 
of code produce dramatically different results depending on which statement inside the for 
loop is executed first. 

for (i = 0; i != some_value; ++i) for (i = 0; i ! = some_value; ++i) 



if( i % 2 == 0) 



DoSomething(i); 
if( i % 2 == 0); 



continue; 

DoSomething(i); 



continue; 
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These problems interact so that it becomes extremely difficult to create new programs out of 
old ones. [Ref. 6:pp. 613 - 619, Ref. l:pp. 1-2 - 1-20] 

Another problem associated with assignment statements is that each produces only 
a one-word result. In effect, they force programmers to think in a word-at-a-time 
manner. For example, to apply a function to an entire array of values, the programmer 
must access each value individually. Not only is this wasteful of computer assets, but it 
results in what Backus refers to as the "von Neumann bottleneck" of conventional 
programming languages. [Ref. 6:pp. 613-619] 

2. Functional Languages 

Backus proposes the methodology of functional programming as the solution to 
these problems. Functional languages have removed variables and the assignment 
statement from their syntax so that their basic building block becomes the function. It is 
through "the use of a fixed set of combining forms... plus simple definitions" [Ref. 6:p. 
619] that the programmer is able to build new functions from existing functions. It thus 
becomes possible to form a new program by combining two or more existing programs or 
functions together. 

The absence of assignment statements and variables removes the problems 
plaguing conventional languages caused by side effects, etc. because the program now 
operates exclusively in the world of expressions. This permits the programmer to maintain 
a clear conceptual view of the program. It is easier to understand and reason about the task 
the program is to perform [Ref. 5:pp. 65 - 69]. It now becomes not only possible, but 
practical to prove programs correct [Ref.6:pp. 624 - 625]. 

Another direct benefit stemming from the absence of side effects is order. The 
values of expressions are no longer dependent on the order in which they are evaluated. 
Therefore, functional languages provide a natural means of performing parallel 
computations [Ref. 7:p. 35]. Functional languages and the associated methodology of 
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functional programming may very well provide the key to programming the massively 
parallel computers entering service nowadays. All of the above benefits have applicability 
to ongoing research in the SDI program. 

The authors feel that functional programming can best be summarized by the 
following thought — assignment statements are to functional programming what GOTO 
statements are to structured programming. 

D. ASSUMPTIONS 

An IBM 5 personal computer/IBM compatible personal computer was chosen as the 
target machine for this implementation. The authors felt that the nature of the language and 
its intended use were better suited for the PC/personal work station environment as 
opposed to a mini- or main-frame time shared environment. The PC environment should 
provide greater flexibility and freedom when implementing follow-on tools for the PHI 
programming language. Also, future compiler improvements will not have to be concerned 
with extraneous interfaces to another system. Working with a PC environment eliminates 
the need to take into account the effects the PHI environment will have on another user of 
the system. The implementor is able to work with a system that remains constant — a 
known quantity. 

The assumed target machine configuration is based on the equipment available in the 
Naval Postgraduate School's computer science microcomputer lab. Each machine is 
configured with 640K bytes of RAM, one (most have two) 20M byte hard disk drive, one 
1.2M byte 5 inch floppy disk drive, and the 8087 math co-processor; each currently 
operates under the MS-DOS 6 3.x operating system. These machines are readily available 
to all computer science students at the Naval Postgraduate School, and many students own 

5 IBM is a registered trademark of Internal Business Machines Corporation. 

6 MS-DOS is a registered trademark of Microsoft Corporation. 



11 



personal computers with similar configurations. It is not necessary to utilize a hard disk 
when executing the PHI compiler. 

E. CONSTRAINTS 

As is the case with most implementation theses, time was probably the biggest 
constraint facing the authors. This involved making certain trade-offs; e.g. should the 
major effort be directed towards a full implementation of PHI while concentrating on a 
particular component of the compiler, or should the major effort be directed towards a full 
implementation of the compiler while concentrating on a subset of the PHI language? The 
authors felt that the greatest benefit could be gained by implementing a complete compiler. 
Having to actually face the issues and problems associated with designing, implementing, 
and interfacing a full compiler implementation would be much different than just reading 
about them in a text. As a result, this thesis implements only a subset 7 of PHI. 

Since PHI is an experimental language it is still undergoing changes and revisions. 
Trying to modify and update the compiler design with each version proved to be an 
impossibility. The authors were forced to freeze the design based on the language as it 
stood on 07 January 1987. Any follow-on work will need to update the front-end and 
back-end of the compiler to meet the requirements of these new versions of PHI. A 
description of the grammar as implemented and a description of the latest version of the 
grammar may be found in the Appendixes. 



7 This subset is discussed in the individual chapters on the front-end and back-end. 



n. FRONT-END OF THE COMPILER 



The authors separated the design of the PHI compiler into two modules, a front-end 
and a back-end. These modules were then further subdivided to produce the general layout 
of Figure 1.1. The authors believe this modularization simplifies the design and will aid in 
understanding the system, thus decreasing future maintenance problems. 

The front-end of the PHI compiler is comprised of the scanner (lexical analyzer), the 
parser (syntactic analyzer), and their associated error recovery routines. Two possible 
interactions between the lexical and syntactic analyzers were considered. The first 
incorporates the scanner into the parser, and tokens are produced by the scanner only upon 
request of the syntactic analyzer. Thus, this system acts like a pipeline. An alternate 
method is to allow the scanner to tokenize the entire source program, store the tokens in 
some data structure, and pass this structure to the parser. [Ref. 3:p. 10] 

For the prototype implementation of a PHI compiler, the authors based the design on 
the first interaction. Although the second method is conceptually very easy to understand, 
the authors think the current implementation is clean and will readily lend itself to future 
enhancements. Any input alphabet peculiarities are restricted to the lexical analyzer, and 
this independence should provide benefits for the next student(s) who work on the PHI 
programming environment. 

A. LEXICAL ANALYSIS — THE SCANNER 

The PHI compiler reads a source file of ASCII text which is fed to the scanner for 
lexical analysis. The principle task of lexical analysis is to separate or divide the source 
program into tokens for use during syntactic analysis [Ref.8:p. 84, Ref. 9:p. 155]. This 
is accomplished in the PHI compiler through a character-by-character examination of the 
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user's source file. These characters are assembled/grouped into the individual tokens 
which represent terminal symbols of the PHI grammar. Examples of some of the terminal 
symbols are operators, identifiers, keywords, and constants. A complete listing of the PHI 
tokens may be found in the header file for the scanner in Appendix E. 

The primary advantage to tokenizing the source program is that the design of the 
syntactic analyzer needs to take into account only one type of data unit — the token [Ref. 
3:p. 7]. This simplifies the design of the parser because provisions do not have to be 
made for handling white space and comments. The scanner has already removed them. 
Also, removing white space and comments and utilizing a fixed-length representation for 
the tokens saves space. Once tokenization is complete, the source program can be 
discarded and the compacted tokenized file can be utilized for further analysis. 

In order to correctly tokenize the source file there must be some discrete means 
available to accurately represent each token. There are several ways of describing tokens. 
One means available is to use a regular grammar. In this method "generative rules are 
given for producing the desired tokens" [Ref. 3:p. 142]. An equivalent, but different, 
method is to use finite-state acceptors, FSAs, to recognize tokens. The authors found it 
easier to visualize this as a recognitive vice generative problem. For this reason the various 
tokens were modeled using FSAs. An example of an unsigned number recognizer is 
shown in Figure 2.1. The interested reader is directed to Tremblay and Sorenson [Ref. 
3:Chapter 4] for an excellent introduction to the practice of using FSAs to model tokens. 
The authors found that utilizing FSAs greatly simplified the design, coding, and debugging 
of the lexical analyzer — one picture was worth a hundred lines of code. 

The ideal grammar would allow each token to be uniquely and unambiguously 
identified. Once the lexical analyzer started on the path of building a token, it would be 
able to continue until the end with no backtracking. Due to limitations with the standard 
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Figure 2. 1 Unsigned Number Recognizer 



ASCII character set, the designer of PHI used multiple keystrokes, or characters, to 
represent various operators in the language 8 . This resulted in compound token types. 
Also, as in other programming languages, PHI overloads certain operators, allowing them 
to do double duty 9 by taking on different context-dependent meanings. 

The problem of dealing with compound token types was easily handled through the 
use of a single lookahead character. For example, upon finding the character the 
scanner looks ahead to the next character to see if it is ">" ( — ») or another (--). If the 
next character is neither of these two, it indicates that the token is just the simple token 
Distinguishing overloaded operators was solved by essentially ignoring it in the scanner! 
The authors took the position this is basically a syntax analyzer problem and there was no 
reason to complicate the scanner by handling it. The scanner just identifies a generic token 
type, e.g. SUB_, and lets the parser make the proper determination of its true meaning, e.g. 
SUB_ or NEG_. 

There are several design decisions relating to the lexical analyzer worth noting. The 
authors, following the example of Pascal, C, and other languages, took the position that 



8 Some examples of this are -> for — == for s and <> for *. 

^For example, + and - can serve as either an unary or binary arithmetic operator. 
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